Abstract



It is known that modeling an information source via a symbolic dynamical system
evolving over the unit interval leads to a natural lossless compression scheme
attaining the entropy rate of the source, under general conditions. We extend this
notion to the lossy compression regime assuming a feedforward link is available, 
by modeling a source via a two-dimensional symbolic dynamical system where 
one component corresponds to the compressed signal, and the other essentially 
corresponds to the feedforward signal. For memoryless sources and an arbitrary 
bounded distortion measure, we show this approach leads to a family of simple 
deterministic compression schemes that attain the rate-distortion function of the 
source. The construction is dual to a recent optimal scheme for channel coding 
with feedback. 

I Introduction 

Lossless compression of a discrete information source to its entropy rate $H$ is a well-
studied topic. A possibly lesser-known approach to this problem is one based on symbolic
dynamical systems, where the information generating mechanism is modeled by
a randomly initialized iterative mapping of the unit interval to itself, and the emitted 
source sequence is a quantized observation of that process. For well behaved mappings 
the source sequence constitutes an expansion of the initial point, i.e., corresponds to a 
unique such point. Furthermore, the prefixes of this expansion describe the initial point 
with (exponentially) increasing resolution, and the unit interval can be uniformly partitioned
into ~ $2^{nH}$ subintervals so that with high probability, the subinterval containing
the initial point will have all its points admitting the same length-n expansion. This 
leads to a conceptually simple and optimal compression scheme: A finite source sequence 
is mapped to a representing subinterval by computing the corresponding reverse trajec- 
tory of the dynamical system, and is reconstructed by following the trajectory of an 
arbitrary point in that subinterval¹. A comprehensive study of the symbolic dynamics
framework for information sources can be found in [1] . Some of the ideas can be traced 
back to Rényi, see [2] and references therein.



¹This has a flavor similar to arithmetic coding and (using variable-length coding) essentially coincides
with it in some cases, see Example 1.



In this paper, we extend the concept above to the lossy source coding regime, under 
the assumption that a noiseless feedforward link is available. This setting is described 
as follows: An encoder observes a stochastic source sequence $Y^n$ over some product
alphabet $\mathcal{Y}^n$, and maps it to a rate $R$ index set $E = \{1, 2, \ldots, 2^{nR}\}$ using some encoding
function $e : \mathcal{Y}^n \mapsto E$. The index is sent to the decoder. At time $k$, the decoder knows
the sequence $Y^{k-1}$ via the feedforward link, and generates an approximation of $Y_k$ using
a decoding function $f_k : E \times \mathcal{Y}^{k-1} \mapsto \mathcal{X}$, where $\mathcal{X}$ is the reconstruction alphabet. The
quality of the approximation is measured w.r.t. a distortion measure $d : \mathcal{X} \times \mathcal{Y} \mapsto \mathbb{R}^+$,
by evaluating the time-averaged expected distortion:

$$D = n^{-1} \sum_{k=1}^{n} \mathbb{E}\, d\big(f_k(e(Y^n), Y^{k-1}), Y_k\big)$$

The rate-distortion function of the source is the infimum of all rates $R$ for which there
exist encoding and decoding functions achieving a distortion at most $D$, for any $n$
large enough. It is denoted $R_{\mathrm{ff}}(D)$ under the feedforward assumption, and $R(D)$ where
feedforward is absent (i.e., when restricting $f_k(e, y^{k-1}) = f_k(e)$).

This model has been initially motivated and studied in the context of competitive 
prediction [3], where it was shown that feedforward does not decrease the rate-distortion 
function for a large family of sources (in particular, memoryless). An in-depth analysis 
of the rate-distortion function with feedforward appears in [1]. A simple scheme in- 
spired by a successive error compression feedback coding technique and achieving the 
rate-distortion function for discrete memoryless sources, was suggested in [5]. Another 
optimal protocol building on the Schalkwijk-Kailath scheme for channel coding with 
feedback over the AWGN channel, was suggested for the white Gaussian source. In this paper,
we suggest an alternative approach based on symbolic dynamics and motivated by a
recent optimal feedback transmission scheme, termed posterior matching [7][8][9]. The
suggested approach yields a conceptually simple compression protocol, which is shown 
to achieve the rate-distortion function for discrete memoryless sources with a bounded 
distortion measure. 

II Preliminaries 

Random variables (r.v.'s) are denoted by upper-case letters, their realizations by cor-
responding lower-case letters. A r.v. $X$ (either real or discrete) is associated with a
probability distribution $P_X(\cdot)$ (over $\mathbb{R}$, or over a discrete alphabet $\mathcal{X} \subset \mathbb{N}$) and we write
$X \sim P_X$. The cumulative distribution function (c.d.f.) of $X$ is denoted by $F_X$. We write
$\mathbb{E}(\cdot)$ for expectation and $P(\cdot)$ for the probability of an event within the parentheses.
$H(X)$ is the entropy of a discrete r.v. $X$, $h(X)$ is the differential entropy of a continu-
ous r.v. $X$, and $I(X;Y)$ is the mutual information between a pair of r.v.'s $X, Y$. We use
$|\Delta|$ for the length of an interval $\Delta \subset \mathbb{R}$, $\log$ for $\log_2$, $\circ$ for function composition, $\bar{A}$ for the
closure of the set $A$, $\mathbb{1}_A(\cdot)$ for the indicator function over the set $A$, $\mathcal{I}$ for the open unit
interval $(0, 1)$, and $\mathcal{I}^2 \triangleq \mathcal{I} \times \mathcal{I}$ for the open unit square. An open partition of a set $A$ (in
what follows, $\mathcal{I}$ or $\mathcal{I}^2$) is a family of disjoint open subsets $\{A_j\}$ of $A$, such that $\bigcup \bar{A}_j = \bar{A}$.
A sequence $x^n$ over a finite alphabet is said to be (strongly) $\varepsilon$-typical w.r.t. $P_X$, if the
(zero order) empirical distribution of symbols in $x^n$ is $\varepsilon$-close to the distribution $P_X$ in
the supremum norm. The set of all such length $n$ sequences is denoted $T_{n,\varepsilon}(P_X)$.

We now turn to define a (two-dimensional) dynamical source, generalizing the defi-
nition in [1]. Note that in the sequel, we discuss in detail a significantly more restrictive
family of dynamical sources. We provide the rather abstract definition below both for
future reference, and as we believe it is more instructive.
A dynamical source S has the following components: 

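• A triplet of alphabets $\mathcal{X}, \mathcal{Y}, \mathcal{Z} \subset \mathbb{N}$.

• Two open partitions of $\mathcal{I}$ into open intervals $\{\Pi^0_i\}_{i \in \mathcal{X}}$ and $\{\Pi^1_j\}_{j \in \mathcal{Z}}$, and the cor-
responding product partition $\Pi_{ij} = \Pi^0_i \times \Pi^1_j$ of $\mathcal{I}^2$. Without loss of generality we
assume that the intervals are arranged from left to right (or vice versa) according
to the natural alphabet order.

• Two functions $\sigma_0 : \mathcal{I} \mapsto \mathcal{X}$, $\sigma_1 : \mathcal{I} \mapsto \mathcal{Z}$ that are equal to $i, j$ over $\Pi^0_i, \Pi^1_j$ respec-
tively.

• A function $\zeta : \mathcal{X} \times \mathcal{Z} \mapsto \mathcal{Y}$, and its corresponding extension to $\zeta : \mathcal{I}^2 \mapsto \mathcal{Y}$ that is
constant and equal to $\zeta(i,j)$ when restricted to $\Pi_{ij}$.

• A mapping $T : \mathcal{I}^2 \mapsto \mathcal{I}^2$ of the form

$$T(\theta, \phi) = \big(T_0(\theta, \zeta(\theta, \phi)),\, T_1(\theta, \phi)\big) \tag{1}$$

such that $T$ restricted to each $\Pi_{ij}$ is a continuous bijection, and $\{T(\Pi_{ij})\}_{(i,j)\,:\,\zeta(i,j)=k}$
is an open partition of $\mathcal{I}^2$ for each $k \in \mathcal{Y}$.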

Setting $(\theta_1, \phi_1) \in \mathcal{I}^2$ as an initial state, the source $S$ is associated with the following
sequences, all of which are deterministic functions of the initial state:

• The state sequence $(\theta^\infty, \phi^\infty)$ over $\mathcal{I}^2$, recursively defined by $(\theta_n, \phi_n) = T(\theta_{n-1}, \phi_{n-1})$.

• The source sequence $y^\infty$ over the alphabet $\mathcal{Y}$, defined by $y_n = \zeta(\theta_n, \phi_n)$.

• The component sequences $x^\infty, z^\infty$ over the alphabets $\mathcal{X}, \mathcal{Z}$ respectively, defined by
$x_n = \sigma_0(\theta_n)$, $z_n = \sigma_1(\phi_n)$.

Furthermore, any finite source sequence $y^n$ corresponds to a fundamental set $\mathcal{U}_n(y^n) \subset
\mathcal{I}^2$, defined to be the set of all initial states $(\theta_1, \phi_1) \in \mathcal{I}^2$ that result in the source sequence
$y^n$.

Following [1] again, a probabilistic dynamical source is a pair $(S, P)$ where $S$ is a
dynamical source, and $P$ is a probability measure equivalent to the Lebesgue measure
over $\mathcal{I}^2$. Setting $(\Theta_1, \Phi_1) \sim P$ as the initial state, the source $(S, P)$ is naturally associ-
ated with the stochastic sequences $\Theta^\infty, \Phi^\infty, X^\infty, Z^\infty, Y^\infty$, all of which are deterministic
functions of the initial state.

III Lossless Coding

Let $(S,P)$ be a probabilistic dynamical source with $|\mathcal{Z}| = 1$, i.e., one dimensional, where
we can assume $\mathcal{X} = \mathcal{Y}$. In this case the fundamental sets are simply intervals in $\mathcal{I}$
(in this section we disregard the redundant dimension). Under some further contrac-
tion conditions, an asymptotic equipartition property was shown to hold [1], namely
$\frac{1}{n}\log P(\mathcal{U}_n(Y^n))^{-1}$ tends in probability to the entropy rate $\mathcal{H}(Y^\infty)$ of the source
sequence. This immediately leads to an optimal compression protocol: The unit inter-
val is uniformly partitioned into ~ $2^{n(\mathcal{H}(Y^\infty)+\varepsilon)}$ representative intervals. The trajectory of
the dynamical source is reversed using $y^n$, namely recovering the fundamental interval
$T_0^{-1}(\cdot, Y_1) \circ \cdots \circ T_0^{-1}(\cdot, Y_{n-1}) \circ T_0^{-1}(\mathcal{I}, Y_n)$. The index of a representative contained in
the fundamental interval is used to describe the source sequence². To reconstruct $Y^n$,
the dynamical source is initialized with any point inside the representative interval.

Example 1 (Memoryless Sources). To generate a memoryless source over the alphabet
$\mathcal{Y}$, we set $P = \mathrm{Uniform}(\mathcal{I})$, $\zeta(i,j) = \zeta(i) = i$, and $T_0(\theta, i)$ to be affine and map $\Pi^0_i$ to
$\mathcal{I}$. This results in a source sequence that is i.i.d.-$P_Y$, where $P_Y(i) = |\Pi^0_i|$. If $T_0(\theta, i)$
are all monotonically increasing, the fundamental intervals are precisely those generated
by the simple arithmetic coding protocol for the source, and coding them (the typical
ones) as described (or alternatively, using a variable-rate code to obtain zero error)
results in lossless compression with a rate approaching $H(Y)$. Note that in particular
for $Y \sim \mathrm{Uniform}(\mathcal{Y})$, the source sequence is simply the $|\mathcal{Y}|$-base expansion of the initial
state point.
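The construction of Example 1 can be illustrated with a minimal sketch, assuming the monotonically increasing choice of the affine maps. The function names `partition_edges`, `emit` and `fundamental_interval` are hypothetical and not taken from the paper; exact rational arithmetic is used so that interval edges are not blurred by rounding.

```python
from fractions import Fraction

def partition_edges(p_y):
    """Left-to-right edges of the intervals Pi_i, where |Pi_i| = p_y[i]."""
    edges = [Fraction(0)]
    for p in p_y:
        edges.append(edges[-1] + Fraction(p))
    return edges

def emit(theta, p_y, n):
    """Run the one-dimensional source for n steps from the initial state theta."""
    edges = partition_edges(p_y)
    symbols = []
    for _ in range(n):
        # Quantized observation: index of the partition interval holding theta.
        i = max(k for k in range(len(p_y)) if edges[k] <= theta)
        symbols.append(i)
        # Assumed increasing affine map of Pi_i onto the unit interval.
        theta = (theta - edges[i]) / (edges[i + 1] - edges[i])
    return symbols

def fundamental_interval(y, p_y):
    """Reverse the trajectory: apply the inverse maps for y_n first, y_1 last."""
    edges = partition_edges(p_y)
    lo, hi = Fraction(0), Fraction(1)
    for i in reversed(y):
        width = edges[i + 1] - edges[i]
        lo, hi = edges[i] + width * lo, edges[i] + width * hi
    return lo, hi
```

Under these assumptions the length of the returned interval equals the product of the symbol probabilities, which is the arithmetic-coding interval mentioned in the example; any point inside it (for instance the midpoint) reproduces the sequence when passed to `emit`.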

Example 2 (The Continued Fraction Source). The continued fraction expansion of a
number in $\mathcal{I}$ can be generated by a dynamical source [1]. In this case we have $\mathcal{Y} = \mathbb{N}$,
the open partition is $\Pi^0_i = (1/(i+1), 1/i)$, $\zeta(i,j) = \zeta(i) = i$, and $T_0(\theta, i) = \frac{1}{\theta} - i = \frac{1}{\theta}$
(mod 1). Endowing the source with any probability measure $P$ that is equivalent to the
Lebesgue measure over $\mathcal{I}$, the state process converges to the invariant distribution that
admits the density $f_{\mathrm{inv}}(\theta) = \frac{1}{\ln 2 \cdot (1+\theta)} \cdot \mathbb{1}_{\mathcal{I}}(\theta)$ [10].

Coding the fundamental intervals as described results in lossless compression with
a rate approaching the entropy rate of the continued fraction source, which is given by
$\mathcal{H}(Y^\infty) = \frac{\pi^2}{6\ln^2 2}$. It is interesting to note that in this case, a more efficient (yet equiv-
alent) coding mechanism for the fundamental intervals is readily available: Represent a
finite source sequence by the unique rational number $\frac{p_n}{q_n}$ it is the continued fraction
expansion of. It is well known that for almost all $\theta \in \mathcal{I}$ (w.r.t. the Lebesgue mea-
sure), the denominator of the convergents of the continued fraction expansion satisfies
$\frac{1}{n}\log q_n \to \frac{\pi^2}{12\ln^2 2}$ [10], and so $(p_n, q_n)$ can be represented at a rate of twice this number,
which is precisely the entropy rate of the continued fraction source.
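The map of Example 2 and the rational representative $p_n/q_n$ can be sketched as follows. This is an illustrative sketch only: the names `cf_digits` and `convergent` are hypothetical, and the input is restricted to exact rationals so that every step is exact.

```python
from fractions import Fraction

def cf_digits(theta, n):
    """First n continued-fraction digits of theta in (0, 1), via theta -> 1/theta mod 1."""
    digits = []
    for _ in range(n):
        if theta == 0:  # a rational input has a finite expansion
            break
        inverse = 1 / theta
        i = inverse.numerator // inverse.denominator  # integer part of 1/theta
        digits.append(i)
        theta = inverse - i  # fractional part, i.e. 1/theta mod 1
    return digits

def convergent(digits):
    """The rational number p_n/q_n whose continued fraction expansion is `digits`."""
    value = Fraction(0)
    for i in reversed(digits):
        value = 1 / (i + value)
    return value
```

For example, `cf_digits(Fraction(7, 16), 5)` yields the digits `[2, 3, 2]`, and `convergent([2, 3, 2])` recovers `Fraction(7, 16)`, so the pair (numerator, denominator) is a complete description of the finite sequence.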



IV Lossy Coding with Feedforward 
IV.1 Motivation

In the lossless setting, a finite source sequence was described by efficiently enumerat- 
ing (typical) fundamental sets, obtained via a representation of an initial state up to 
a suitable resolution. In the lossy setting, we wish to provide only partial information 
regarding the fundamental set. To that end, a two-dimensional dynamical source model 
was introduced, where the high-level idea is to provide the decoder with a representation 
of the $\theta$-component of the initial state only. At time $k$, the decoder knows the sequence
$y^{k-1}$ (via feedforward), and can therefore compute the $\theta$-component $\theta_k$ that corre-
sponds to the initial state $\theta_1$ it was given. This is made possible due to the restriction
(1) on the structure of $T_0$, making its evolution dependent only on the $\theta$-component and
the causal knowledge of the source sequence. Had it known the $\phi$-component as well,
the decoder could have reconstructed $Z_k$, and hence $Y_k$. Here, it can only reconstruct
$X_k$, which can serve as an estimate for $Y_k$.

²If there is no such interval, an arbitrary index is used. This error event is of vanishing probability.

So, our first task is, for a fixed source sequence distribution, to design a probabilistic
dynamical source $(S, P)$ that is consistent with this distribution, and also makes $X^n, Y^n$
dependent in a prescribed way so that this reconstruction has low distortion. However,
there is an even more difficult obstacle. The initial $\theta$-component has to be described
with a finite rate, and (loosely speaking) this should be done while making sure that an
initial $\phi$-component can be selected so that the statistical dependence above is roughly
maintained. For memoryless sources, both tasks can be accomplished.

IV.2 Memoryless Sources

Let $P_Y$ be a probability distribution over the alphabet $\mathcal{Y}$. There are many different
probabilistic dynamical sources for which the source sequence is i.i.d.-$P_Y$. One simple
example was given in the previous section, where $|\mathcal{Z}| = 1$ and $T$ is affine on any $\Pi^0_i$,
and corresponds to a lossless compression with rate $H(Y)$. However, in two dimensions
there is an abundance of distinct probabilistic dynamical sources that admit an i.i.d.-$P_Y$
source sequence.

Consider any channel $P_{X|Y}$ from $Y$ to $X$ over the alphabets $\mathcal{X} \times \mathcal{Y}$, let $P_{XY} =
P_Y \times P_{X|Y}$ be the joint distribution and let $P_{Y|X}$ be the corresponding test channel from
$X$ to $Y$. The following Lemma is easily observed.

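Lemma 1. There exists an alphabet $\mathcal{Z}$ of size $|\mathcal{Z}| \le |\mathcal{X}|(|\mathcal{Y}| - 1) + 1$, a function
$\zeta : \mathcal{X} \times \mathcal{Z} \mapsto \mathcal{Y}$, and a r.v. $Z$ independent of $X$, such that $(X, \zeta(X, Z)) \sim P_{XY}$.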

Now, let us define the following dynamical source $S$. The construction is motivated
by the posterior matching scheme, a capacity achieving feedback transmission scheme
for memoryless channels with feedback [7][8][9].

• $|\Pi^0_i| = P_X(i)$ for any $i \in \mathcal{X}$.

• $|\Pi^1_j| = P_Z(j)$ for any $j \in \mathcal{Z}$.

• The function $\zeta$ is that of Lemma 1 and its natural extension.

• The mapping $T = (T_0, T_1)$ is defined as follows:

— Let $F_{X|Y}$ be the conditional c.d.f. for $P_{X|Y}$. For any fixed $k \in \mathcal{Y}$, $T_0(\theta, k)$ is a
continuous non-decreasing function from $\mathcal{I}$ onto $\mathcal{I}$, is affine on each $\Pi^0_i$, and
is equal to $F_{X|Y}(i|k)$ on the right edge of $\Pi^0_i$.

— $T_1(\theta, \phi) = T_1(\phi)$ is one dimensional, affine on each $\Pi^1_j$ and maps it onto $\mathcal{I}$.
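The definition of the first component of the mapping can be made concrete with a short sketch that builds it from a joint distribution by linear interpolation between the prescribed values at the partition edges. The helper name `make_T0` and the input layout `p_xy[i][k] = P(X=i, Y=k)` are assumptions made for illustration; alphabet symbols are taken to be `0, 1, ...` and the joint distribution is assumed strictly positive.

```python
from fractions import Fraction

def make_T0(p_xy):
    """Build sigma_0 and T_0 from a joint pmf given as p_xy[i][k] = P(X=i, Y=k)."""
    nx, ny = len(p_xy), len(p_xy[0])
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(p_xy[i][k] for i in range(nx)) for k in range(ny)]

    # Edges of the partition intervals: cumulative sums of P_X.
    edges = [Fraction(0)]
    for p in p_x:
        edges.append(edges[-1] + p)

    # cdf[k][i + 1] = F_{X|Y}(i | k), with a leading 0 for the left edge.
    cdf = []
    for k in range(ny):
        values = [Fraction(0)]
        for i in range(nx):
            values.append(values[-1] + p_xy[i][k] / p_y[k])
        cdf.append(values)

    def sigma0(theta):
        """Index of the partition interval that contains theta."""
        return max(i for i in range(nx) if edges[i] <= theta)

    def T0(theta, k):
        """Affine on each interval, equal to F_{X|Y}(i|k) at its right edge."""
        i = sigma0(theta)
        fraction = (theta - edges[i]) / (edges[i + 1] - edges[i])
        return cdf[k][i] + fraction * (cdf[k][i + 1] - cdf[k][i])

    return sigma0, T0
```

With a binary symmetric joint distribution and crossover probability 1/4, for instance, `T0(Fraction(1, 4), 0)` evaluates to `Fraction(3, 8)`, which agrees with the slope $2(1-D)$ on the left half of the interval.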

Remark. Note that when $P_{X|Y}$ is noiseless (e.g., $X = Y$) then $S$ collapses to the one
dimensional lossless construction of Example 1.

Lemma 2. For any $y^n \in \mathcal{Y}^n$, the fundamental set $\mathcal{U}_n(y^n)$ of the dynamical source $S$ is a
finite disjoint union of product rectangles. The projections of these rectangles onto the
$\theta$-axis form a set of at most $n(|\mathcal{X}| - 1) + 1$ distinct intervals.

Proof. The first assertion follows easily from the affinity of $T$. For $n = 1$, the number
of distinct intervals on the $\theta$-axis is exactly $|\mathcal{X}|$. For any fixed $k \in \mathcal{Y}$, $T_0(\alpha, k)$ is quasi-
affine over $\mathcal{I}$ as a function of $\alpha$, with at most $|\mathcal{X}| - 1$ corner points. Hence the number
of distinct intervals can increase by at most $|\mathcal{X}| - 1$ at each step. □



The following Lemma is adapted from [9]. 



Lemma 3. Let $P \sim \mathrm{Uniform}(\mathcal{I}^2)$. The probabilistic dynamical source $(S,P)$ has the
following properties:

(a) The sequence $Z^\infty$ is i.i.d.-$P_Z$, and $Z_n$ is statistically independent of $X^n$.

(b) $(X_n, Y_n) \sim P_{XY}$, and $X_n$ is statistically independent of $Y^{n-1}$.

(c) The source sequence $Y^\infty$ is i.i.d.-$P_Y$, and $Y_n - X_n - X^{n-1}Y^{n-1}Z^{n-1}$ form a Markov
chain.

(d) $I(\Theta_1; Y^n) = nI(X;Y)$.

Proof. Assertion (a) is immediate: $Z^\infty$ is a deterministic function of $\Phi_1$ and evolves
according to the memoryless dynamical law described in Example 1, hence is i.i.d.-$P_Z$.
Furthermore, $X^n$ is a deterministic function of $(\Theta_1, Z^{n-1})$ which are mutually indepen-
dent of $Z_n$. For the other assertions, see [9]. □

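Define

$$\omega_k(\cdot) \triangleq T_0^{-1}(\cdot, k), \qquad \Delta_n^\varepsilon(y^n) \triangleq \omega_{y_1} \circ \cdots \circ \omega_{y_{n-1}} \circ \omega_{y_n}\big((\varepsilon, 1-\varepsilon)\big) \tag{2}$$

Namely, the interval $\Delta_n^\varepsilon(y^n)$ is obtained by reversing the trajectory of the (edges of
the) interval $(\varepsilon, 1 - \varepsilon)$. The following result, also adapted from [9], is central to our
derivations.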
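The reversed-trajectory recursion can be sketched for a binary alphabet, assuming the symmetric setting in which the first partition splits the unit interval at 1/2 and the conditional c.d.f. value at that point is $1-D$ for $k=0$ and $D$ for $k=1$. The names `T0`, `T0_inv` and `delta_interval` are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def T0(theta, k, D):
    """Forward map: piecewise affine with a single corner at theta = 1/2."""
    c = 1 - D if k == 0 else D  # value of T0(., k) at theta = 1/2
    if theta < HALF:
        return 2 * c * theta
    return c + 2 * (1 - c) * (theta - HALF)

def T0_inv(t, k, D):
    """Inverse of T0(., k), i.e. the map omega_k of the recursion."""
    c = 1 - D if k == 0 else D
    if t <= c:
        return t / (2 * c)
    return HALF + (t - c) / (2 * (1 - c))

def delta_interval(y, eps, D):
    """Pull the edges of (eps, 1 - eps) back along y: omega_{y_n} first, omega_{y_1} last."""
    lo, hi = eps, 1 - eps
    for k in reversed(y):
        lo, hi = T0_inv(lo, k, D), T0_inv(hi, k, D)
    return lo, hi
```

Since each map is monotone, tracking the two edges suffices. Running the forward map along the same sequence from the returned edges brings them back to `eps` and `1 - eps`, which is a convenient sanity check on the ordering of the composition.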

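Theorem 1. Suppose that $P_{XY}$ is strictly positive over $\mathcal{X} \times \mathcal{Y}$, and that for any fixed
$\theta \in \mathcal{I}$, $T_0(\theta, j)$ is not a constant function of $j$. Then for any $\varepsilon > 0$,

(a) $\lim_{n\to\infty} P\big(|\Delta_n^\varepsilon(Y^n)| > 2^{-nR}\big) = 0$ for any $R < I(X;Y)$.

(b) $\lim_{n\to\infty} P\big(\Theta_1 \notin \Delta_n^\varepsilon(Y^n)\big) \le 2\varepsilon$.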

Loosely speaking, Theorem 1 implies that by observing the source sequence, the
initial $\Theta_1$ component of the state sequence can be found up to a resolution of $2^{-nI(X;Y)}$.
In a feedback communication setting, this initial value represents a message to be sent
over the channel $P_{Y|X}$, and this concentration result means that one can transmit
roughly $2^{nI(X;Y)}$ such messages and decode them with high reliability, which corresponds
to a communication rate of at most $I(X;Y)$ bits per channel use. In order to be able to
generate $X_n$ (channel input) the encoder needs to know $Y^{n-1}$ on top of the message
$\Theta_1$, hence the feedback. In the dual lossy source coding with feedforward setting we
consider, $\Theta_1$ plays the role of a lossy description of the source sequence, and we will
need at least $I(X;Y)$ bits per source symbol to represent it with high enough accuracy.
In order to be able to generate $X_n$ (lossy reconstruction of $Y_n$) the decoder needs to know
$Y^{n-1}$ on top of the (quantized representation of the) lossy description $\Theta_1$, hence the
feedforward.

Fix the block size $n$, and set $R = I(X;Y) + \delta$ for some $\delta > 0$. Let $\{\mathcal{J}_m\}_{m=1}^{2^{nR}}$ be an
open partition of $\mathcal{I}$ into equi-sized intervals, and let $a_m$ be the midpoint of $\mathcal{J}_m$. Denote
the set of all midpoints by $A_n$.



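Lemma 4.
$$\lim_{n\to\infty} P\Bigg( \bigcup_{\substack{\theta \in A_n \cap \Delta_n^\varepsilon(Y^n) \\ (\theta,\phi) \in \mathcal{U}_n(Y^n)}} \big\{\big(x^n(\theta,\phi),\, z^n(\theta,\phi)\big)\big\} \cap T_{n,\varepsilon}(P_{XZ}) = \emptyset \Bigg) = 0.$$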

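Proof Outline. For lack of space we only describe the main elements of the proof, skip-
ping some details. Let $V(y^n)$ be the set of indices $m$ such that $a_m \in \Delta_n^\varepsilon(y^n)$, and $\mathcal{J}_m$
intersects with two or more intervals that are projections of a product rectangle in $\mathcal{U}_n(y^n)$
onto the $\theta$-axis. By Lemma 2, $|V(y^n)| \le n|\mathcal{X}|$. Define

$$q_n(y^n) = P_{\Theta_1|Y^n}\Big( \bigcup_{m \in V(y^n)} \mathcal{J}_m \,\Big|\, y^n \Big)$$

and consider Theorem 1 with a rate $I(X;Y) - \varepsilon_1$. Now, assume to the contrary that
$P(q_n(Y^n) > \varepsilon_2) > \varepsilon_3$ for some fixed $\varepsilon_2 \in (2\varepsilon, 1 - 2\varepsilon)$, $\varepsilon_3 > 0$, i.e., with probability at
least $\varepsilon_3$ the distribution of $\Theta_1$ given $Y^n$ has a mass at least $\varepsilon_2$ inside that polynomial
sized set of intervals. Then we have (some transitions assuming $n$ large enough)

$$\begin{aligned}
n^{-1} I(\Theta_1; Y^n) &= -n^{-1} h(\Theta_1 | Y^n) \\
&\ge n^{-1}\varepsilon_3 \Big[ \varepsilon_2 \log\big(\varepsilon_2 \cdot 2^{nR} \cdot (n|\mathcal{X}|)^{-1}\big) \\
&\qquad + (1 - 2\varepsilon - \varepsilon_2) \log\big((1 - 2\varepsilon - \varepsilon_2) \cdot 2^{n(I(X;Y) - \varepsilon_1)}\big) \Big] \\
&\quad + n^{-1}(1 - \varepsilon_3)(1 - 2\varepsilon) \log\big((1 - 2\varepsilon) \cdot 2^{n(I(X;Y) - \varepsilon_1)}\big) \\
&= (1 - 2\varepsilon) \cdot I(X;Y) + \varepsilon_2\varepsilon_3\delta - \varepsilon_1(1 - 2\varepsilon - \varepsilon_2\varepsilon_3) + O(\log n / n)
\end{aligned} \tag{3}$$

where we have used the concentration result of Theorem 1 for the inequality transition.
Since $\varepsilon, \varepsilon_1$ can be taken arbitrarily small for $n$ large enough, the right-hand side of (3)
can be made larger than $I(X;Y)$, contradicting Lemma 3. Note that this argument is
similar in essence to the converse to the channel coding Theorem [12].

We conclude that $q_n(Y^n) \to 0$ in probability, which loosely speaking means that
with high probability, $P_{\Theta_1|Y^n}$ is mostly concentrated on $\Delta_n^\varepsilon(y^n) \setminus \bigcup_{m \in V(y^n)} \mathcal{J}_m$ for large $n$.
Using typicality arguments together with the properties in Lemma 3, this can be shown
to imply that with high probability we can find $\theta$ in that set together with some $\phi$ such
that $(\theta, \phi) \in \mathcal{U}_n(y^n)$ and $(x^n(\theta,\phi), z^n(\theta,\phi)) \in T_{n,\varepsilon}(P_{XZ})$. By definition, $\theta \in \mathcal{J}_m$ where
$\mathcal{J}_m$ is a subset of some interval which is a projection of a product rectangle in $\mathcal{U}_n(Y^n)$.
This in turn implies that $x^n(a_m, \phi) = x^n(\theta, \phi)$ and $z^n(a_m, \phi) = z^n(\theta, \phi)$, concluding the
proof. □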

We are now ready to describe the compression protocol. 
Encoder 

(a) Given the sequence $y^n$, compute $\Delta_n^\varepsilon(y^n)$ using the recursion (2).

(b) Out of the ≈ $2^{n\delta}$ intervals $\mathcal{J}_m \subset \Delta_n^\varepsilon(y^n)$, find the one with the least index³ $m$, for
which there exists $\phi \in \mathcal{I}$ such that $(x^n(a_m, \phi), z^n(a_m, \phi)) \in T_{n,\varepsilon}(P_{XZ})$. If no such
index exists, arbitrarily set $m = 1$.

(c) Send the index $m$ to the decoder, which requires a rate of $I(X;Y) + \delta$ bits per source
symbol.



³It seems that a random selection should work with high probability, making the process simpler.
However, this was not verified.



Decoder 



(a) Initialization: Set $\theta_1 = a_m$, compute $x_1 = \sigma_0(\theta_1)$.

(b) For any $k$, predict $\hat{y}_k = x_k$.

(c) Receive the true $y_k$ via the feedforward link, compute $\theta_{k+1} = T_0(\theta_k, y_k)$ and $x_{k+1} =
\sigma_0(\theta_{k+1})$.

(d) Repeat steps (b)-(c) up to $k = n$.
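The decoder loop can be sketched for the binary symmetric setting, assuming the first partition splits the unit interval at 1/2 and the first map component has the piecewise affine form used for the Bernoulli example. The function names are hypothetical, and the received midpoint is simply passed in as `theta1`.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def sigma0(theta):
    """Reconstruction symbol: 0 on (0, 1/2), 1 on (1/2, 1)."""
    return 0 if theta < HALF else 1

def T0(theta, k, D):
    """Binary first map component; its value at theta = 1/2 is 1-D for k=0, D for k=1."""
    c = 1 - D if k == 0 else D
    if theta < HALF:
        return 2 * c * theta
    return c + 2 * (1 - c) * (theta - HALF)

def decode(theta1, y, D):
    """Predict each source symbol, then advance the state with the true symbol."""
    theta, predictions = theta1, []
    for y_k in y:
        predictions.append(sigma0(theta))  # step (b): predict y_k by x_k
        theta = T0(theta, y_k, D)          # step (c): y_k arrives via feedforward
    return predictions
```

As a small illustration, with `D = Fraction(1, 4)` and `theta1 = Fraction(3, 5)` the source sequence `[1, 0, 1, 0]` is predicted without error, because the state alternates between 3/5 and 2/5 along that sequence.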

The compression rate attained by the scheme is $R = I(X;Y) + \delta$. If encoding step
(b) is successful then the pair $(x^n, z^n)$ is jointly $P_{XZ}$-typical, which implies that $(x^n, y^n)$
is jointly $P_{XY}$-typical. By Lemma 4, when encoding an i.i.d.-$P_Y$ sequence $Y^n$ this occurs
with probability approaching 1 as $n \to \infty$. Since the distortion measure is bounded, the
expected distortion achieved by the scheme is given by $D = \mathbb{E}_{P_{XY}} d(X, Y) + o(1)$.
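The two quantities that characterize the scheme, the mutual information $I(X;Y)$ and the expected distortion under $P_{XY}$, are easy to evaluate numerically for a given joint distribution. The sketch below is illustrative only; the function names and the layout `p_xy[i][k] = P(X=i, Y=k)` are assumptions.

```python
from math import log2

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint pmf given as p_xy[i][k] = P(X=i, Y=k)."""
    p_x = [sum(row) for row in p_xy]
    p_y = [sum(col) for col in zip(*p_xy)]
    return sum(p * log2(p / (p_x[i] * p_y[k]))
               for i, row in enumerate(p_xy)
               for k, p in enumerate(row) if p > 0)

def expected_distortion(p_xy, d):
    """Expectation of d(X, Y) under the joint pmf."""
    return sum(p * d(i, k)
               for i, row in enumerate(p_xy)
               for k, p in enumerate(row))

def binary_entropy(q):
    """Binary entropy function in bits."""
    if q in (0, 1):
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)
```

For a uniform binary source with a binary symmetric test channel of crossover probability 0.25, that is `p_xy = [[0.375, 0.125], [0.125, 0.375]]`, the mutual information coincides with `1 - binary_entropy(0.25)` and the expected Hamming distortion is 0.25.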

The development above holds for any $P_Y$ and $P_{X|Y}$ that satisfy the requirements of
Theorem 1. The strict positivity constraint for $P_{XY}$ has a negligible effect, since such
distributions can always be approximated arbitrarily well via admissible distributions, and
the distortion measure is bounded. The second constraint is redundant as it can always
be averted by using a variant of the probabilistic dynamical source, as in the channel
coding case [9][13]. Hence, we have proved the following result.

Theorem 2. For any discrete memoryless source and bounded distortion measure, the 
protocol described above can perform arbitrarily close to the rate distortion function of 
the source. 

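Example 3 (Bernoulli Source and Hamming Distortion). Let $\mathcal{X} = \mathcal{Y} = \{0,1\}$, $Y \sim
\mathrm{Bern}(\tfrac{1}{2})$, $d(\cdot,\cdot)$ the Hamming distortion measure. The rate-distortion function $R_{\mathrm{ff}}(D) =
R(D) = 1 - h_b(D)$ is achieved by $X \sim \mathrm{Bern}(\tfrac{1}{2})$, $Z \sim \mathrm{Bern}(D)$ independent of $X$, and
$Y = X + Z \pmod 2$. The partitions and mappings are given by

$$\Pi^0_0 = (0, \tfrac{1}{2}), \qquad \Pi^0_1 = (\tfrac{1}{2}, 1)$$
$$\Pi^1_0 = (0, 1-D), \qquad \Pi^1_1 = (1-D, 1)$$

$$T_0(\theta, 0) = 2\theta(1-D) \cdot \mathbb{1}_{\Pi^0_0}(\theta) + (2D\theta + 1 - 2D) \cdot \mathbb{1}_{\Pi^0_1}(\theta)$$
$$T_0(\theta, 1) = 2\theta D \cdot \mathbb{1}_{\Pi^0_0}(\theta) + (2(1-D)\theta + 2D - 1) \cdot \mathbb{1}_{\Pi^0_1}(\theta)$$

$$T_1(\theta, \phi) = T_1(\phi) = \frac{\phi}{1-D} \cdot \mathbb{1}_{\Pi^1_0}(\phi) + \frac{\phi - (1-D)}{D} \cdot \mathbb{1}_{\Pi^1_1}(\phi)$$

The mappings and the fundamental sets for $n = 3$ are depicted in Figures 1 and 2.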
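A quick Monte Carlo sketch can be used to check the statistics of this example numerically. It runs the two-dimensional source from uniformly drawn initial states and measures the empirical Hamming distortion between the two binary sequences and the empirical frequency of ones in the source sequence. The function name `simulate`, the seed, and the trial counts are hypothetical choices, and the second map component is assumed to be increasing on each of its two intervals. Short trajectories are used because floating-point precision is consumed by the expanding maps.

```python
import random

def simulate(D, n, trials, seed=0):
    """Return (empirical distortion, empirical frequency of y = 1)."""
    rng = random.Random(seed)
    errors = ones = 0
    for _ in range(trials):
        theta, phi = rng.random(), rng.random()  # uniform initial state
        for _ in range(n):
            x = 0 if theta < 0.5 else 1          # reconstruction symbol
            z = 0 if phi < 1 - D else 1          # noise symbol
            y = x ^ z                            # source symbol, X + Z mod 2
            errors += z                          # Hamming distortion between x and y
            ones += y
            # First component: piecewise affine, value at 1/2 is 1-D (y=0) or D (y=1).
            c = 1 - D if y == 0 else D
            theta = 2 * c * theta if theta < 0.5 else c + 2 * (1 - c) * (theta - 0.5)
            # Second component: each interval stretched onto the unit interval.
            phi = phi / (1 - D) if phi < 1 - D else (phi - (1 - D)) / D
    total = n * trials
    return errors / total, ones / total
```

Calling `simulate(0.25, 10, 5000)` should return a distortion close to 0.25 and a frequency of ones close to 0.5, in line with the joint distribution described in the example.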

V Conclusions 

A symbolic dynamical system approach to lossy source coding with feedforward was 
introduced, yielding in particular a conceptually simple and optimal compression proto- 
col for memoryless sources. In this latter case, the construction is dual to the posterior 
matching feedback communication scheme for memoryless channels. Future work should 
examine the suggested framework for sources with memory. A reasonable first goal could 
be the case where the $\phi$-component of the dynamical source evolves independently as in
the memoryless case, yet generates e.g. a Markovian $Z^n$.




Figure 2: Fundamental sets with n = 3 for Bern(1/2) source, Hamming distortion D